Multilingual Projection for Parsing Truly Low-Resource Languages
Authors
Abstract
We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.
Similar resources
Cross-Lingual Parser Selection for Low-Resource Languages
In multilingual dependency parsing, transferring delexicalized models provides unmatched language coverage and competitive scores, with minimal requirements. Still, selecting the single best parser for any target language poses a challenge. Here, we propose a lean method for parser selection. It offers top performance, and it does so without disadvantaging the truly low-resource languages. We c...
Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages
In cross-lingual dependency annotation projection, information is often lost during transfer because of early decoding. We present an end-to-end graph-based neural network dependency parser that can be trained to reproduce matrices of edge scores, which can be directly projected across word alignments. We show that our approach to cross-lingual dependency parsing is not only simpler, but also a...
Multilingual Structural Projection across Interlinear Text
This paper explores the potential for annotating and enriching data for low-density languages via the alignment and projection of syntactic structure from parsed data for resource-rich languages such as English. We seek to develop enriched resources for a large number of the world’s languages, most of which have no significant digital presence. We do this by tapping the body of Web-based lingui...
Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks
This work focuses on the rapid development of linguistic annotation tools for resource-poor languages. We experiment with several cross-lingual annotation projection methods using Recurrent Neural Network (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between the source and target language. More precisely, our metho...
Delexicalized transfer parsing for low-resource languages using transformed and combined treebanks
This paper describes the IIT Kharagpur dependency parsing system in the CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. We primarily focus on the low-resource languages (surprise languages). We have developed a framework to combine multiple treebanks to train parsers for low-resource languages by a delexicalization method. We have applied transformation on the ...
Journal title:
- TACL
Volume 4
Publication date 2016